Skip to content

docs(splat-native): address review feedback on #212 (ray segmentation + Cholesky scratch buffer)#213

Merged
AdaWorldAPI merged 2 commits into
masterfrom
claude/splat-native-ultrasound-v1-fixes
Jun 5, 2026
Merged

docs(splat-native): address review feedback on #212 (ray segmentation + Cholesky scratch buffer)#213
AdaWorldAPI merged 2 commits into
masterfrom
claude/splat-native-ultrasound-v1-fixes

Conversation

@AdaWorldAPI

Copy link
Copy Markdown
Owner

Summary

Follow-up to #212 (merged). Addresses 2 codex findings on the W1c primitive signatures in the splat-native SIMD substrate plan.

Fixes

Fix 1 — batched_opacity_blend ray segmentation (codex P1)

The original signature took a single flat sorted_amplitudes slice and emitted a single out_alpha[ray] — but a renderer composites N independent view rays per frame, each with its own front-to-back-sorted Gaussian sequence. Without per-ray boundaries the implementation could not know which Gaussians belong to which output pixel.

Adds a CSR-style ray_offsets: &[u32] prefix-sum (length n_rays + 1) that segments the flat amplitude buffer into per-ray ranges:

pub fn batched_opacity_blend(
    sorted_amplitudes: &[f32],    // flat; all rays' samples concatenated
    ray_offsets: &[u32],          // length = n_rays + 1 (CSR-style); ray r's range is [ray_offsets[r]..ray_offsets[r+1])
    opacity_lut: &[u8; 256],
    out_alpha: &mut [u8],         // length = n_rays
);

Contract: ray_offsets[0] == 0; ray_offsets[n_rays] == sorted_amplitudes.len(); empty ray (ray_offsets[r] == ray_offsets[r+1]) yields α = 0; rays are independent (outer loop parallelizable); per-frame quantization is caller-side.

Tests added: single-ray + multi-ray reference comparison, empty-ray boundary case, multi-ray independence (concatenated rays ≡ separate calls), ray_offsets invariant debug-asserts, SIMD parity.

Fix 2 — batched_mahalanobis Cholesky scratch buffer (codex P2)

The implementation note said L was "heap-free via stack or caller-provided scratch" but the public signature had no scratch parameter. At the documented N = 1_000_000 bench size, the Cholesky cache is 6 * N * size_of::<f32>() = 24 MiB — not stack-feasible.

Adds explicit cholesky_scratch: &mut [f32] parameter (length 6*N):

pub fn batched_mahalanobis(
    query_xyz: &[f32],            // length 3*M
    mu_xyz: &[f32],               // length 3*N
    sigma_packed: &[f32],         // length 6*N (upper-triangle Σ per Gaussian)
    cholesky_scratch: &mut [f32], // length 6*N (caller-provided; holds packed L per Gaussian)
    out_dist_sq: &mut [f32],      // length M*N (row-major)
);

Sizing guidance: N ≤ 8192 MAY use stack-resident; N > 8192 MUST allocate once at engine init and re-use across frames; the function MUST NOT allocate internally. Matches the splat-fit engine + registration loop pattern (allocated once per SplatFitActor mailbox at boot).

What's NOT in this PR

  • Source code: still none. Plan-spec only.
  • The W1c primitive-addition contract (all three backends mandatory, parity tests gate, VPABSB-correction-style degenerate-input documentation) is unchanged.

Test plan

  • Codex P1 (ray segmentation) — added per-ray offset + 3 new tests.
  • Codex P2 (Mahalanobis scratch) — added cholesky_scratch parameter + sizing note + zero-allocation contract.
  • Codex re-review on this PR.

Companion follow-up PRs (the four-PR fix arc)

Repo Branch Fixes
lance-graph claude/splat-native-ultrasound-v1-fixes 5 fixes — StepDomain alignment with shipped contract, STATUS_BOARD canonical-status invariant + Sprint column, pose_se3 16→24, NR-SPLAT-PHI normative rule, MD040 fence-tag lint, IVD-MDR→MDR Annex VIII (#471 follow-up)
ndarray claude/splat-native-ultrasound-v1-fixes 2 fixes — batched_opacity_blend ray segmentation (CSR-style ray_offsets), batched_mahalanobis Cholesky-scratch buffer (#212 follow-up)
MedCare-rs claude/splat-native-ultrasound-v1-fixes 1 fix — pose_se3 16→24 (#163 follow-up)
OGAR claude/splat-native-customer-fixes (PR #35) 1 fix — IVD-MDR → MDR Annex VIII Rule 11 (#34 follow-up)

All four cross-reference each other. No source code in any of the four PRs — design-spec / handover updates only.

Authored by session claude/lance-graph-ontology-review-Pyry3.

Follow-up to PR #212 (merged). Addresses two codex review findings.

## Fix 1 — `batched_opacity_blend` needs ray segmentation (codex P1)

The original signature took a single flat `sorted_amplitudes` slice
and emitted a single `out_alpha[ray]` — but a renderer composites N
independent view rays per frame, each with its own front-to-back-
sorted Gaussian sequence. Without per-ray boundaries the
implementation could not know which Gaussians belong to which output
pixel, so it would either composite the same global sequence for
every ray or guess boundaries outside the API.

Adds a CSR-style `ray_offsets: &[u32]` prefix-sum (length `n_rays + 1`)
that segments the flat amplitude buffer into per-ray ranges.
Documented contract:

- `ray_offsets[0] == 0` and `ray_offsets[n_rays] == sorted_amplitudes.len()`
- Empty ray (`ray_offsets[r] == ray_offsets[r+1]`) yields `out_alpha[r] = 0`
- Rays are independent (no cross-ray data dependence) — outer loop
  is trivially parallelizable
- Per-frame amplitude quantization is caller-side; `opacity_lut` is
  a frame-global constant for that pass

Adds three new tests:
- Multi-ray independence (concatenated rays match per-ray calls)
- Empty-ray boundary case (→ α = 0)
- `ray_offsets` invariant debug-asserts

## Fix 2 — `batched_mahalanobis` needs scratch buffer for Cholesky cache (codex P2)

The original implementation note said L was "heap-free via stack or
caller-provided scratch" but the public signature had no scratch
parameter. At the documented `N = 1_000_000` bench size, the
Cholesky cache is `6 * N * size_of::<f32>() = 24 MiB` — not
stack-feasible. The function would either have to allocate
internally (breaking the zero-allocation contract) or recompute
factors per query (breaking the throughput contract).

Adds explicit `cholesky_scratch: &mut [f32]` parameter (length `6*N`)
with documented sizing guidance:

- `N ≤ 8192` MAY use a stack-resident buffer
- `N > 8192` MUST allocate once at engine init and re-use across frames
- The function MUST NOT allocate internally

Matches the `splat-fit` engine and registration-loop pattern where
the scratch is allocated once per `SplatFitActor` mailbox at boot.

## What's NOT in this PR

- Source code: still none. Plan-spec only.
- The W1c primitive-addition contract (all three backends mandatory,
  parity tests gate, VPABSB-correction-style degenerate-input
  documentation) is unchanged — the fix updates the two signatures
  but not the testing or backend invariants.

## Test plan

- [x] Codex P1 (ray segmentation) — added per-ray offset + 3 new tests
      to the contract.
- [x] Codex P2 (Mahalanobis scratch) — added `cholesky_scratch`
      parameter + sizing note + zero-allocation contract.
- [x] Signatures rebalanced (each line-broken with one arg per line +
      sized comments) for readability.
- [ ] Codex re-review on this PR.
@coderabbitai

coderabbitai Bot commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Warning

Review limit reached

@AdaWorldAPI, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 15 minutes and 7 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: d39e15cc-db31-4036-a3cf-4103cbdb34d5

📥 Commits

Reviewing files that changed from the base of the PR and between 481205a and 3c78901.

📒 Files selected for processing (1)
  • .claude/plans/splat-native-ultrasound-simd-substrate-v1.md

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0c1fb05d81

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

query_xyz: &[f32], // length 3*M
mu_xyz: &[f32], // length 3*N
sigma_packed: &[f32], // length 6*N (upper-triangle Σ per Gaussian)
cholesky_scratch: &mut [f32], // length 6*N (caller-provided; holds packed L per Gaussian)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep the Mahalanobis skeleton signature in sync

The detailed spec now requires cholesky_scratch, but the module skeleton at the top of this same plan still advertises batched_mahalanobis(query_xyz, mu_xyz, sigma_packed, out_dist_sq) without any scratch buffer. Since this document is the handoff for implementing src/simd_splat.rs, an implementer following the skeleton would recreate the exact no-scratch API this change is meant to eliminate, leaving the zero-allocation contract unenforceable.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in commit 3c78901 on the same PR branch before merge — the module skeleton at line 50 now shows the cholesky_scratch: &mut [f32] parameter with the inline note pointing to §4.2 for sizing + the "function MUST NOT allocate" contract. Verified on merged master tip e2ec430:

pub fn batched_mahalanobis(
    query_xyz: &[f32],            // M × 3 query points
    mu_xyz: &[f32],               // N × 3 Gaussian centroids
    sigma_packed: &[f32],         // N × 6 packed Σ
    cholesky_scratch: &mut [f32], // N × 6 — caller-provided packed-L scratch (24 MiB @ N=1M); function MUST NOT allocate (see §4.2)
    out_dist_sq: &mut [f32],      // M × N output (squared Mahalanobis)
);

Closing — skeleton and §4.2 detailed spec are now in sync.

pub fn batched_opacity_blend(
sorted_amplitudes: &[f32], opacity_lut: &[u8; 256], out_alpha: &mut [u8],
sorted_amplitudes: &[f32], // flat; contains all rays' samples concatenated
ray_offsets: &[u32], // length = n_rays + 1 (CSR-style); ray r's range is [ray_offsets[r]..ray_offsets[r+1])

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Update the opacity skeleton to include ray offsets

The per-primitive spec now makes ray_offsets mandatory for segmenting concatenated rays, but the module skeleton above still shows batched_opacity_blend(sorted_amplitudes, opacity_lut, out_alpha) with no segmentation parameter. If downstream code is implemented from that skeleton, it will still have no way to map the flat amplitude buffer to independent output rays, preserving the API bug this follow-up is supposed to fix.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in commit 3c78901 on the same PR branch before merge — the module skeleton now shows the CSR-style ray_offsets: &[u32] parameter (length n_rays + 1) with the inline note pointing to §4.3 for the segmentation contract; out_alpha length corrected to n_rays; sorted_amplitudes re-described as flat concatenation. Verified on merged master tip e2ec430:

pub fn batched_opacity_blend(
    sorted_amplitudes: &[f32],    // flat; all rays' samples concatenated (front-to-back per ray)
    ray_offsets: &[u32],          // length = n_rays + 1 (CSR-style); ray r's range is [ray_offsets[r]..ray_offsets[r+1]) (see §4.3)
    opacity_lut: &[u8; 256],      // amplitude → opacity LUT
    out_alpha: &mut [u8],         // length = n_rays — composited alpha per ray
);

Closing — skeleton and §4.3 detailed spec are now in sync.

…213)

Follow-up to codex review on PR #213. The per-primitive specs at §4.2
(`batched_mahalanobis`) and §4.3 (`batched_opacity_blend`) were
correctly updated in #213, but the module skeleton at the top of the
same file (the `pub fn` declarations in the `src/simd_splat.rs` sketch
around line 50) still advertised the OLD no-scratch / no-ray-offsets
signatures.

Since this document is the implementation handoff for
`src/simd_splat.rs`, an implementer scanning the skeleton first would
recreate the exact APIs #213 was meant to eliminate, leaving the
zero-allocation contract (Mahalanobis) and per-ray segmentation
contract (opacity blend) unenforceable.

## Fix

Updates the two skeleton signatures to match §4.2 and §4.3 exactly,
with cross-references to the per-primitive sections:

- `batched_mahalanobis` — adds `cholesky_scratch: &mut [f32]` (length
  N × 6); inline comment cites §4.2 sizing guidance and the
  "function MUST NOT allocate" contract.
- `batched_opacity_blend` — adds `ray_offsets: &[u32]` (length
  n_rays + 1, CSR-style); `sorted_amplitudes` re-described as flat
  concatenation; `out_alpha` length now `n_rays` (not "per pixel");
  inline comment cites §4.3 segmentation contract.

The two implementations of the §4.2 / §4.3 detailed specs are
unchanged in this PR — this commit only syncs the up-front skeleton
so the two views of the API agree.

## Test plan

- [x] Skeleton + §4.2 + §4.3 now show identical parameter lists.
- [x] Comments on the new parameters cite the section that owns the
      detailed contract (so an implementer who reads the skeleton
      first gets pointed to the contract section).
- [ ] Codex re-review on this PR.
@AdaWorldAPI AdaWorldAPI merged commit e2ec430 into master Jun 5, 2026
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants